(54) MPEG-2 decoding with a reduced RAM requisite by ADPCM recompression before storing
MPEG-2 decompressed data optionally after a subsampling algorithm



(57) The video memory requisite of an MPEG decoder that decompresses the I, P and, optionally, the B
pictures according to the MPEG compression algorithm, and that requires the respective MPEG-decompressed
data to be stored in respective buffers organized in said video memory, may be dynamically reduced by
subsampling and recompressing, according to an ADPCM algorithm, at least the data pertaining to the I and
P pictures before coding and storing them in the respective buffers. Subsequently, the stored data are
decoded, decompressed and upsampled to reconstruct blocks of pels to be sent to a macroblock-to-raster
scan conversion circuit.



[Figure: VIDEO DECODER BLOCK DIAGRAM (MR = Memory Reduction), showing the MEMORY DATA BUS and the CONTROL BUS.]



EP 0 817 498 A1 



PAL

720x576x8 for the luma (luminance) (Y)            3.317.760 bits
360x288x8 for the U chroma (chrominance U)          829.440 bits
360x288x8 for the V chroma (chrominance V)          829.440 bits
                                                = 4.976.640 bits

NTSC

720x480x8 for the luma (luminance) (Y)            2.764.800 bits
360x240x8 for the U chroma (chrominance U)          691.200 bits
360x240x8 for the V chroma (chrominance V)          691.200 bits
                                                = 4.147.200 bits



Therefore, in a PAL system, which represents the most burdensome case and may serve as a reference example,
the actual total amount of memory required will be given by:
1.835.008 + 835.584 + 4.976.640 + 4.976.640 + (4.976.640 * 0.7407) = 16.310.070 bits.
This calculation considers a 0.7407 optimization of the B-picture frame buffer.

A further optimization may consist in undertaking the decompression of the B-picture without resorting to a
storage step in the external RAM, by carrying out an equivalent function internally in the integrated decoder
device by a dedicated circuit block functionally placed upstream of the Display Unit.

Considering this further optimization, the video RAM requirement drops to:
1.835.008 + 835.584 + 4.976.640 + 4.976.640 = 12.623.872 bits.
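
The arithmetic above can be checked with a short sketch. It assumes 4:2:0 sampling at 8 bits per sample, as
implied by the picture sizes listed earlier. The names frame_bits, FIXED_TERMS and B_BUFFER_FACTOR are
illustrative only, and the two fixed terms are copied from the text without assuming what they represent.

```python
def frame_bits(width, height, bits_per_sample=8):
    """Bits needed for one frame: full-size luma plus two half-size chroma planes."""
    luma = width * height * bits_per_sample
    chroma = (width // 2) * (height // 2) * bits_per_sample  # one plane (U or V)
    return luma + 2 * chroma


PAL_FRAME = frame_bits(720, 576)
NTSC_FRAME = frame_bits(720, 480)

# Fixed terms copied verbatim from the text (their meaning is not assumed here).
FIXED_TERMS = (1_835_008, 835_584)
# Optimization factor of the B-picture frame buffer, as stated in the text.
B_BUFFER_FACTOR = 0.7407

# I and P buffers plus an optimized B buffer in external memory.
with_external_b = sum(FIXED_TERMS) + 2 * PAL_FRAME + PAL_FRAME * B_BUFFER_FACTOR
# B-picture handled inside the decoder chip: only the I and P buffers remain.
without_external_b = sum(FIXED_TERMS) + 2 * PAL_FRAME

print(PAL_FRAME, NTSC_FRAME, without_external_b, with_external_b)
```

The last value is fractional because of the 0.7407 factor; the text quotes it rounded up to a whole bit.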

In this case the B-buffer is realized within the same chip containing the "core" of the decoder, being required
to convert the scanning of each 8*8 block, as defined in the MPEG-2 compressed data stream, into that of each
row of the picture (field or frame), as required by the video display process of the picture itself. Such a
conversion macrocell is commonly called "MACROBLOCK TO RASTER SCAN CONVERTER".

In the prior European patent application No. 96830106.9, filed on 11 March 1996 in the name of the same
applicant, a date that is herein claimed as a priority date for the common disclosure, a method and relative
device were described that allowed for a remarkable reduction of the above cited video memory requisite to
about 8Mbits.

The idea behind the invention that was described and claimed in the above identified previous patent
application is the recognition that the amount of memory required by the MPEG decoding process, as stated in
the above calculations, can be remarkably reduced by recompressing the pictures used as a reference for the
prediction (I-picture and P-picture in the case of the MPEG-1 and MPEG-2 standards) after the MPEG
decompression and before they are temporarily stored in the external video memory, and by decompressing them
when they are read from the external memory.

It has now been discovered, and this represents the object of this new patent application, that the video
memory requisite may be synergically minimized, reducing it in practice to only 4Mbits, by carrying out a data
subsampling of at least the I and P images after MPEG decompression, prior to the ADPCM recompression used to
reduce the memory requisite to 8Mbits, and before coding and storing these data in the respective video memory
buffers. Subsequently, the decoded data, decompressed by blocks of pels during the reconstruction phase and
before being sent to a "MACROBLOCK TO RASTER SCAN" conversion unit, are upsampled congruently with the
subsampling factor used before their recompression.

It has been found that the degradation of the picture quality (picture definition) following this subsampling
operation and the subsequent data upsampling remains approximately within limits that are hardly noticeable
when the picture is displayed on a TV screen.

Furthermore, the "core" architecture of the data processor, according to the present invention, allows for a
simplified implementation of means that either select a mode of full reduction of the video memory requisite,
by carrying into effect such subsampling and upsampling operations, or disable this optional mode of
minimization of the memory requisite, retaining a higher picture quality with a video memory requisite of
8Mbits, which remains by far lower than the memory requisite of prior known systems.

Basically, this invention permits the implementation of an "adaptive" system for automatically managing video
memory resources in order to optimize the performance of the device "core" as a function of the operating
conditions and, accordingly, of the prevailing memory requisites.

In practice, the ADPCM recompression factor may be varied so as to optimize the use of the space available
within the external video memory.

If the memory space is sufficient for the purpose, the memory requisite reduction macrocell, object of the
present invention, is bypassed, preserving the full quality of the picture as produced by the MPEG decoder
system.

In case of insufficient memory, the system implements the algorithm for the reduction to the 8Mbit memory
requisite described in the preceding application No. 96830106.9 for recompressing at least the I and P
pictures. Therefore, [...]

[...] variable length decoding block, comprising a "run-length" type decoding stage, a circuit undertaking an
inverse quantization function, a processing circuit of the inverse discrete cosine transform (I_DCT) and a
predictor value generation network, and is characterized by further comprising:

- a buffer for data macroblocks;

- a quincunx subsampling circuit;

- a circuit for coding and recompressing, according to an adaptive differential pulse code modulation (ADPCM)
scheme, that recompresses the I and P pictures, encoding the I_DCT block output data that, after motion
compensation, are written in the respective buffers of the external memory;

- a circuit for decompressing and decoding the (ADPCM) data, output from the I_DCT block, relative to the I
and P pictures so recompressed, read from the respective buffers of the external memory, capable of generating
a video data stream relative to the I and P pictures, together with the outcoming data from the I_DCT relative
to the decompressed B-pictures;

- a buffer for such video data stream; and

- an upsampling circuit of said video data stream.

According to an embodiment of the invention, the coding and recompressing circuit may comprise: 

- an acquisition buffer of the decompressed I-DCT data produced by the MPEG decompression block;

- a circuit that assesses the energy content of the buffer and generates a digital variance value of the pel
values of the different data blocks output from the I_DCT block to be stored in the respective buffer of the
external memory;

- a multilevel quantizer, coherently conditioned by the actual or current digital variance value generated by
said circuit;

- a differentiator capable of receiving through a first input the I-DCT data stream produced by the MPEG
decompression block and, through a second input, a predictor value, and of producing an output data stream to
be sent to the input of said quantizer;

- a coding and write circuit of the recompressed data in the respective buffers of the external memory,
capable of receiving as input the output stream of the quantizer;

- a network for the generation of said predictor value comprising a multiplexer capable of receiving through a
first input the I_DCT input data stream and through a second input the predictor value generated by the
network;

- an adder capable of receiving through a first input the output stream of the quantizer and through a second
input the data output by said multiplexer, and of producing an output stream of sum data;

- a limiter circuit capable of receiving as an input said sum data stream produced by said adder and followed
in cascade by a circuit that generates said predictor value, which is supplied to the second input of the
differentiator stage and of the multiplexer.

The decompressing and decoding circuit can be constituted by a decoding circuit capable of receiving through a
first input a compressed and coded data stream coming from the respective external memory buffers and through
a second input the relative variance value previously stored in the same external memory buffers, and by a
decompression network made up of an adder stage capable of receiving through a first input the decoded data
stream output by said decoding circuit and through a second input the predictor value relative to the
decompressed pel value already generated at the output of the adder, followed by a limiter of the pixel
values.

Of course, the dimensions in pels of the luminance and chrominance blocks of data, the format of the I-DCT
data according to the MPEG compression scheme, the format of the recompression data of the already
decompressed I and P pictures according to the ADPCM scheme, as contemplated by the invention, as well as the
format of the estimated digital variance value and the number of levels of the relative quantizer can be
different from those indicated by way of example in the present description and will normally be defined based
on design preferences of the video decoder or receiver.

The various aspects and relative advantages of the invention will be even more evident through the following
description of an important embodiment and by referring to the attached drawings, wherein:

Figure 1 is a block diagram of the "core" of a video decoder according to the present invention;

Figure 2 shows a detail of the MR Encoder/Decoder of the general scheme of Fig. 1;

Figure 3 is a scheme of the bufferization, subsampling, recompression and ADPCM coding;

Figure 4 shows a possible scheme of the quincunx subsampling block;
[...] video memory is optimized by taking into account the need of storing words of several bits fewer than
16, which is the usual word length into which the external video memory is organized. This optimization is no
longer needed when the system enables a reduction of the full memory requisite to about 4Mbits by enabling the
subsampling algorithm before the ADPCM recompression.

In the preferred case of a "direct" reconstruction of the B-pictures, this is then realized as follows:

- the ADPCM compressed I and P predictors are read from the external DRAM memory and ADPCM decompressed
in order to perform motion compensation of the B-picture that is currently being MPEG decompressed by the
"pipeline", and subjected to a median subsampling procedure.

The macroblocks of I-DCT data so reconstructed are sent to the conversion circuit "MACROBLOCK TO RASTER
SCAN", upstream of the Display Unit of the diagram shown in Fig. 1, and they are then displayed.

This procedure does not require any buffer in the external memory destined to store the B-picture because such
a buffer is present in the macrocell "MACROBLOCK TO RASTER SCAN CONVERTER B-PICTURE".

In line with a fundamental aspect of the system of this invention, characterized by its adaptability to
changing conditions of video memory availability (for example, supposing that a 16Mbits external memory is
available), the system is perfectly capable of deactivating, or otherwise of activating, the algorithm that
reduces the memory requisite through ADPCM recompression, coding and eventual writing in the video memory.

This operation is implemented by controlling two multiplexers through the microprocessor.

The enabling/disabling scheme of the function that reduces the memory requisite in an adaptive manner through
the controlling microprocessor is illustrated in Fig. 2. This partial view shows in detail the two
multiplexers controlled by the microprocessor that perform the activation or deactivation of the ADPCM
recompression system through the MR Encoder and MR Decoder blocks.

The detailed view of the MR Encoder/Decoder block of Fig. 1 also shows an Embedded MR Memory for the
optimization of the external memory management, in line with the cited previous European Patent application.

A feature of the bufferization, "quincunx" subsampling and recompression block of Fig. 1 relative to the data
output by the I-DCT block (for the sake of brevity called I-DCT data) pertaining to the decompressed I and P
pictures is shown in Fig. 3.

The diagram of Fig. 3 also includes two multiplexers MUX to automatically actuate the selection between a
reduction of the memory to only 4Mbits through a data subsampling before ADPCM recompression, or to 8Mbits
through ADPCM recompression alone, in practice by disabling the subsampling algorithm.

The quincunx subsampling realized within the homologous block may take place by discarding one of every two
pixels of the luminance data of each video line (row) to be written in the RAM. The process is repeated line
by line, "offsetting" the position of the retained pixels by one pixel from one line to the next.

As diagrammatically shown in Fig. 4, the quincunx subsampling may be simply realized by directing the pixel
stream through an array of D flip-flops, functioning in parallel, driven by a clock of half the frequency of
the pixel stream. This half-frequency clock may be selected by a multiplexer MUX, at the inputs of which are
applied the complementary clock signals of frequency F/2, F being the basic frequency of the pixel stream
through the flip-flop bank. The line parity signal commands the clock selection, thus producing a quincunx
subsampling grid.
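
The quincunx grid described above can be sketched in software as follows. This is only an illustration of the
sampling pattern, not of the flip-flop circuit of Fig. 4; the function name quincunx_subsample is hypothetical,
and the choice of keeping the even-numbered pixels on even lines (rather than the odd-numbered ones) is an
assumption.

```python
def quincunx_subsample(block):
    """Keep one pixel out of two on every row, shifting the kept positions
    by one pixel on alternate rows (the role played by the line parity signal).

    Assumption: even rows keep columns 0, 2, 4, ...; odd rows keep 1, 3, 5, ...
    """
    return [row[(y % 2)::2] for y, row in enumerate(block)]


# A 16-wide, 8-high luminance block whose pel values encode their position.
block = [[x + 16 * y for x in range(16)] for y in range(8)]
subsampled = quincunx_subsample(block)
print(len(subsampled), len(subsampled[0]))  # rows and columns after subsampling
```

Each row keeps half of its pixels, so a block of 16*8 pels becomes a block of 8*8 pels.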

The quincunx subsampling technique assumes that, upstream of the subsampling stage in relation to the digital
data stream, a reduction of the band of the video signal is performed to comply with the subsampling
algorithm. If the subsampling halves the effective number of samples of the data of a video line, the band
will also need to be halved. This necessary filtering is performed by the Antialiasing Filter whose functional
scheme is given in Fig. 5. The filtering is performed by generating the sum of the products between
appropriate coefficients and adjacent pixels. The T registers may be D flip-flops that store these adjacent
pixels for the time necessary for the filter to output the filtered pixel. These adjoining pixels are
multiplied by preestablished coefficients and added in order to produce the output datum.
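
The sum-of-products filtering can be sketched as a small FIR filter. The coefficients of the Antialiasing
Filter are not given in the text, so the (1, 2, 1)/4 low-pass kernel, the edge replication and the name
fir_filter_row below are assumptions chosen purely for illustration.

```python
def fir_filter_row(row, coeffs=(1, 2, 1), shift=2):
    """Filter one video line as a sum of products of coefficients and adjacent pixels.

    Assumptions: a 3-tap (1, 2, 1) kernel normalized by 2**shift, with the
    first and last pixels replicated at the row edges.
    """
    half = len(coeffs) // 2
    out = []
    for i in range(len(row)):
        acc = 0
        for k, c in enumerate(coeffs):
            j = min(max(i + k - half, 0), len(row) - 1)  # clamp index at the edges
            acc += c * row[j]
        out.append((acc + (1 << (shift - 1))) >> shift)  # rounded division by 2**shift
    return out


print(fir_filter_row([0, 0, 8, 0, 0]))
```

A flat line passes through unchanged, while an isolated peak is spread over its neighbours, which is the
band-limiting effect needed before discarding every other pixel.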

By referring to Fig. 7, the ADPCM encoder block comprises a 64*8 bit buffer (block buffer) for the acquisition
of the I_DCT input data. A dedicated circuit (Variance Estimator) calculates the average pels value of each
block of the I_DCT input data and the average of the sum of the absolute values of the differences between
each pel of the I_DCT data block. With such parameters it is possible to assess the variance of the input data
(pels) block.
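
A minimal sketch of the two statistics follows. The text does not state what each pel is differenced against;
the sketch assumes the differences are taken against the block average (a mean absolute deviation). The name
block_statistics is hypothetical.

```python
def block_statistics(block):
    """Return (average pel value, mean absolute difference from that average).

    Assumption: the 'differences' are taken between each pel and the block mean.
    """
    pels = [p for row in block for p in row]
    mean = sum(pels) / len(pels)
    mean_abs_diff = sum(abs(p - mean) for p in pels) / len(pels)
    return mean, mean_abs_diff


mean, spread = block_statistics([[10, 10], [14, 14]])
print(mean, spread)
```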

Figures 8 and 9 show a detailed functional scheme of the variance prediction block according to a preferred
embodiment. The detailed scheme of Figures 8 and 9 of the variance predictor block makes use of a standard
terminology of immediate understanding by a person skilled in the art. A further definition and description of
each of the stages that constitute the circuital block of the variance estimation is not considered necessary
for a total comprehension of the architecture of the present invention.

The ROM (Read Only Memory) block may be composed of 56 rows, each of 8 columns (8 bits), as indicated in [...]

The ADPCM compression method is applied to each block into which the picture is decomposed through the
following operations:

- Selecting and coding an appropriate quantizer in the digital stream.
- Coding of the first pixel of the block.
- Decorrelating, quantizing and coding of all the remaining pixels of the block.

The various steps and the circuit architecture that carry out these operations are described individually
below:

1) Selection and coding of the quantizer.

It is well documented that the distortion introduced by the process of quantization may be reduced if the set
of quantization values Q(k) is calculated by taking into account the energy of the signal to be quantized. It
is also known that different portions of a digital picture may present very different energy values. The
present method defines the set of the values Q(k) relative to each block as a function of the energy of the
block itself as follows:

* The set of the values Q1(k), k = 1, ..., L, utilized in the case of unitary energy is known both to the
coder and to the decoder;

* The energy U of the block is estimated and coded in the digital stream;

* The values Q(k) effectively used by the block are calculated as:
Q(k) = Q1(k) * U;  k = 1, ..., L

An estimation of the block energy may be made in a relatively simple way by hypothesizing a Laplacian
statistic of the prediction error. Indeed, in this case the energy may be calculated by multiplying by the
square root of two the mean of the absolute values of the block prediction errors. The coding of the energy
may be simply done by scaling in terms of the maximum value and by representing the result on a number K of
bits, so as to basically realize a uniform quantization. In selecting the quantizer of the prediction errors
it is also necessary to take into consideration the peak value of the errors of quantization, because in the
case of large prediction errors it might occur that the peak restitution value of the quantizer, according to
the scheme shown hereinbelow, be too small. Thus, simultaneously with the calculation of the variance, the
peak values of the prediction error are also calculated for the first column, within which large prediction
errors are likely to occur because of the greater distance among the lines of a field during interlaced
scanning, and for each group of G consecutive horizontal lines (e.g. G=2). A bit is added to the coding of
each of these groups of pixels in order to signal the event of an excessive peak of prediction error and, as a
result of it, the choice of a quantizer that corresponds to a 2*U energy in the case of a pair of rows and to
4*U in the case of the first column.

A circuit architecture such as that illustrated in detail in Figures 8 and 9 may be used for calculating this
variance estimation.
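
The quantizer selection described in this step can be sketched as follows. The unit-energy table passed as q1,
the maximum energy u_max and all function names are hypothetical; only the relations U = sqrt(2) * mean(|e|),
Q(k) = Q1(k) * U, the uniform coding of U on K bits and the 2*U / 4*U fallback come from the text.

```python
import math


def estimate_energy(prediction_errors):
    """Energy estimate under a Laplacian hypothesis: sqrt(2) times the mean absolute error."""
    return math.sqrt(2) * sum(abs(e) for e in prediction_errors) / len(prediction_errors)


def code_energy(u, u_max, k_bits):
    """Uniformly quantize U on K bits by scaling against an assumed maximum value u_max."""
    levels = (1 << k_bits) - 1
    return min(levels, round(u / u_max * levels))


def decode_energy(code, u_max, k_bits):
    """Inverse of code_energy, as the decoder would apply it."""
    return code * u_max / ((1 << k_bits) - 1)


def scaled_quantizer(q1, u, peak_flag=False, first_column=False):
    """Q(k) = Q1(k) * U, using 2*U (pair of rows) or 4*U (first column) when the peak bit is set."""
    factor = (4 if first_column else 2) if peak_flag else 1
    return [q * u * factor for q in q1]


q1 = [0.5, 1.5]  # hypothetical unit-energy restitution values
print(scaled_quantizer(q1, 2.0), scaled_quantizer(q1, 2.0, peak_flag=True))
```

Because the coder and the decoder share the unit-energy table, only the coded energy and the peak bits need to
be written to the stream for the decoder to rebuild the same Q(k).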

2) Coding of the first pixel of the block

By referring to the scheme of Fig. 7, the first pixel of the block, previously indicated as P(1,1), is not
subject to any sort of prediction; thus it is coded according to its original resolution by way of B bits.

3) Decorrelation, quantization and coding of all the other pixels of the block

By referring to the scheme of Fig. 7, for each pixel of the block, the pixel P* as previously defined will be
adopted as the predictor. It should be noticed that this predictor, according to the scanning order, has
already been quantized and reconstructed, and therefore is not taken from the original picture. This permits a
better control of the picture's quality, coherently with known ADPCM techniques.

Fig. 7 shows a circuit that, besides giving a general view of the encoder, also provides details of the
prediction and quantization loop of single pixels. The calculation of the prediction error is carried out in
terms of modulus and sign. This simplifies the quantization by halving the number of levels upon which the
quantization operates. Indeed, it is known that the statistics of the prediction error are symmetric about
zero.
Figures 10 and 11 illustrate a circuit embodiment of the quantizer.
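
The prediction and quantization loop of Fig. 7, as described above, can be sketched as follows. This is a
simplified illustration under stated assumptions: the pixels are taken as a one-dimensional sequence along the
scanning path, the predictor is the previously reconstructed pixel, and the restitution levels in the example
are hypothetical. Function names are illustrative.

```python
def quantize_modulus(modulus, levels):
    """Index of the restitution level nearest to the modulus of the prediction error."""
    return min(range(len(levels)), key=lambda k: abs(levels[k] - modulus))


def adpcm_encode(pixels, levels, max_value=255):
    """Code the first pixel as-is, then each prediction error as (sign, level index)."""
    first = pixels[0]
    predictor = first
    codes = []
    for p in pixels[1:]:
        err = p - predictor
        sign = 1 if err < 0 else 0                       # error handled as modulus and sign
        index = quantize_modulus(abs(err), levels)
        codes.append((sign, index))
        recon = predictor + (-levels[index] if sign else levels[index])
        predictor = min(max(recon, 0), max_value)        # limiter; predictor is the reconstructed pixel
    return first, codes


def adpcm_decode(first, codes, levels, max_value=255):
    """Rebuild the pixels by accumulating the restitution values, mirroring the encoder loop."""
    out = [first]
    for sign, index in codes:
        recon = out[-1] + (-levels[index] if sign else levels[index])
        out.append(min(max(recon, 0), max_value))
    return out


levels = [0, 2, 5, 10, 20, 40, 80, 160]  # hypothetical restitution values
first, codes = adpcm_encode([100, 102, 110, 90], levels)
print(codes, adpcm_decode(first, codes, levels))
```

Since the encoder predicts from reconstructed rather than original pixels, the decoder reproduces exactly the
values the encoder used as predictors, so quantization errors do not accumulate along the scan.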

The scheme of Fig. 10 shows the architecture used for generating the seven threshold values S0, S1, S2, S3,
S4, S5 and S6 that represent the arithmetic mean of the restitution values T0, ..., T7. In particular, the
mean is calculated among adjacent restitution values (e.g. S2 = T2 + T3) and this result is not divided by
two, in order to maintain full accuracy. Of course, this is compensated by multiplying by two the |err| value
of the scheme of Fig. 11, which is in fact represented with 9 bits (i.e. 1 sign bit is added) rather than with
8 bits.
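
The threshold comparison can be sketched as follows. The restitution values in the example are hypothetical,
the function names are illustrative, and resolving a tie towards the higher level is an assumption.

```python
def doubled_thresholds(restitution):
    """S_k = T_k + T_(k+1): twice the midpoint between adjacent restitution values."""
    return [a + b for a, b in zip(restitution, restitution[1:])]


def quantize_with_thresholds(modulus, restitution):
    """Select a restitution level by comparing 2*|err| with the undivided thresholds.

    Assumes restitution values sorted in increasing order; ties go to the higher level.
    """
    doubled = 2 * modulus  # compensates for the thresholds not being divided by two
    index = 0
    for s in doubled_thresholds(restitution):
        if doubled >= s:
            index += 1
    return index


T = [1, 4, 9, 16, 25, 36, 49, 64]  # hypothetical restitution values T0..T7
print(doubled_thresholds(T), quantize_with_thresholds(3, T))
```

Working with doubled quantities keeps every comparison in integer arithmetic, at the cost of one extra bit in
the compared value.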
[...]

(2 * 16 * 8 * 8) + (2 * 8 * 8 * 8) = 3.072 bits
      luma              chroma

In each PAL picture there are 1.620 macroblocks:
3.072 * 1.620 = 4.976.640 bits

It is known that the chrominance signal has a lower information content, presenting a band restricted to the
lowest spatial frequencies. This implies a greater predictability of the chrominances themselves, that is, a
greater efficiency of the ADPCM compression. By considering a 3 bit/pixel compression for the luminance and a
2 bit/pixel compression for the chrominance we would then obtain:



(2 * 208) + (144 * 2) = 704 bits
   luma       chroma



Therefore, each frame occupies: 

704 * 1.620 = 1.140.480 bits
The macroblock compression factor so obtained is equal to 4.36. 
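
These per-macroblock figures can be verified with a few lines of arithmetic. The per-block sizes of 208 and
144 bits are taken from the expression above as given; the variable names are illustrative.

```python
# Number of 16x16 macroblocks in a 720x576 PAL picture.
MACROBLOCKS_PAL = (720 // 16) * (576 // 16)

# Uncompressed macroblock: two 16*8 luma blocks and two 8*8 chroma blocks at 8 bits per pel.
uncompressed_bits = (2 * 16 * 8 * 8) + (2 * 8 * 8 * 8)

# Recompressed macroblock: 208 bits per luma block, 144 bits per chroma block (from the text).
LUMA_BLOCK_BITS = 208
CHROMA_BLOCK_BITS = 144
compressed_bits = 2 * LUMA_BLOCK_BITS + 2 * CHROMA_BLOCK_BITS

frame_bits_compressed = compressed_bits * MACROBLOCKS_PAL
compression_factor = uncompressed_bits / compressed_bits

print(MACROBLOCKS_PAL, uncompressed_bits, compressed_bits, frame_bits_compressed)
print(round(compression_factor, 2))
```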

EXAMPLE OF APPLICATION TO AN MPEG DECODER 



By taking into account the above relationships it is possible to reach the target of a reduction to 4 Mbits of
the video memory requisite by assuming such a compression of the MPEG decompressed I and P pictures.

This result is attained by first quincunx subsampling and then ADPCM recompressing the I and P pictures after
MPEG decompression and before they are stored in the external memory. They will then be decompressed when read
from the external memory, as shown in Fig. 1.

The compression is applicable to a 16*8 block output from the I-DCT and from motion compensation, according
to an adaptive DPCM scheme. In particular, in the considered example, a 3 bit compression is selected for the
16*8 blocks of luminance, whereas a 2 bit compression is selected for the 8*8 blocks of chrominance.
Thus, for the PAL format case the memory requisite is as follows:



1.835.008 + 835.584 + 1.140.480 + 1.140.480 = 4.951.552 bits (4.722 Mbits)
                          |           |
                          |           compressed and subsampled P buffer
                          compressed and subsampled I buffer



For an NTSC format the requisite would then be 4.571.392 bits.
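
Both totals can be reproduced with the following sketch, which assumes 704 bits per recompressed macroblock
and the same two fixed terms used in the PAL expression above. The function name reduced_requirement is
hypothetical.

```python
def reduced_requirement(width, height, bits_per_macroblock=704,
                        fixed_terms=(1_835_008, 835_584)):
    """Total bits: the fixed terms from the text plus the recompressed I and P buffers."""
    macroblocks = (width // 16) * (height // 16)
    buffer_bits = macroblocks * bits_per_macroblock  # one recompressed picture buffer
    return sum(fixed_terms) + 2 * buffer_bits


pal_bits = reduced_requirement(720, 576)
ntsc_bits = reduced_requirement(720, 480)
print(pal_bits, round(pal_bits / 2**20, 3), ntsc_bits)
```

Expressed in units of 2^20 bits, the PAL total is the 4.722 Mbits quoted in the text.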

The processes of subsampling/upsampling and ADPCM compression/decompression are illustrated in Figures 14
to 16.

The MPEG decompressed digital image following the inverse discrete cosine transform is in macroblock format,
constituted by 16*16 pels for the luminance component and by 8*8 pels for each of the chrominance components
U and V.

These blocks, as shown by the block diagram of Fig. 3, are bufferized and, in the case of the luminance
component, broken down into two blocks of 16*8 pels.

Each of these (for the luminance alone) is subsampled according to a grid commonly referred to as quincunx, as
shown in Fig. 14.

A new block of 8*8 pels is so obtained. This block feeds the ADPCM Encoder block of Fig. 3, according to the
scanning path shown in Fig. 14 by the dashed line.

Thereafter the ADPCM compression takes place as already described above.
[...] according to an adaptive pulse code modulation scheme (ADPCM);

storing the data relative to the I and P pictures in the form of subsampled and recompressed blocks so coded
in the respective buffers of the video memory;

decoding the stored data relative to subsampled and recompressed blocks of pels relative to said I and P
pictures;

decompressing said data relative to said blocks of pels according to said adaptive differential pulse code
modulation scheme (ADPCM);

upsampling said decoded and decompressed data by reconstructing blocks of pels and sending them to the
macroblock to raster scan conversion circuit.

2. The method according to claim 1, characterized in that the amount of video memory requisite reduction is
modulable by enabling said ADPCM recompression algorithm and coding before the writing of data in said video
memory, and the decoding and ADPCM decompression of the same data read from the video memory, to realize a
first reduction of the video memory requisite, and by enabling said data subsampling algorithm before said
ADPCM recompression and said median upsampling of the ADPCM decompressed data to implement a maximum reduction
of the memory requisite.

3. MPEG video decoder, interfacing with a control bus and a data processing bus of the video pictures to be
written and read in respective buffers external to the "core" of the video decoder, which comprises a first
"first-in-first-out" (FIFO) buffer for data acquisition and writing of the compressed data in a first bit
buffer of an external DRAM memory, a Start Code Detector circuit synchronized by a controller, a bidirectional
buffer (I/O unit) for on screen display data (OSD), a variable length decoder (VLD) of the compressed data
input stream (bitstream), an MPEG decompression block (pipeline-RLD, I_QUANT, I_DCT, Predictor Construction)
of the data decoded by said variable length decoder, comprising a "run length" decoder, an inverse quantizer
circuit, an inverse discrete cosine transform processor, a "predictor" generating circuit, a "MACROBLOCK SCAN
TO RASTER SCAN" conversion circuit for a current B-picture upstream of a display unit, characterized in that
it further comprises

a bufferization and quincunx subsampling circuit of the data output by said inverse cosine transform
processor;

a coding and recompression circuit (MR encoder), operating according to an adaptive differential pulse code
modulation scheme, that recompresses the data relative to at least the I and P pictures of the MPEG algorithm
and codes the recompressed data to be stored in the respective buffers of said memory;

a decoding and decompressing circuit (MR decoder) of said data relative to said I and P pictures read from the
respective buffers of said memory and generating a stream of decoded and decompressed data relative to the
I and P pictures;

a bufferization and median upsampling circuit of said decoded and decompressed data stream;

first multiplexing means enabling/disabling said circuits of bufferization and quincunx subsampling and of
said bufferization and median upsampling;

second multiplexing means enabling/disabling said ADPCM recompressing, coding, decoding and ADPCM
decompressing circuits;

means capable of implementing the selections of said enabling/disabling states of said multiplexing means via
command signals sent through said control bus.